
    Enzyme classification with peptide programs: a comparative study

    Abstract
    Background: Efficient and accurate prediction of protein function from sequence is one of the long-standing problems in biology. The generalised use of sequence alignments for inferring function promotes the propagation of errors, and there are limits to its applicability. Several machine learning methods have been applied to predict protein function, but they lose much of the information encoded in protein sequences because they must transform the sequences into fixed-length representations.
    Results: We have developed a machine learning methodology, called peptide programs (PPs), that deals directly with protein sequences, and compared its performance with that of Support Vector Machines (SVMs) and BLAST on detailed enzyme classification tasks. Overall, the PPs and SVMs performed similarly in terms of Matthews Correlation Coefficient, but the PPs generally achieved higher precision. BLAST performed better overall than both methodologies, but the PPs outperformed both BLAST and the SVMs on the smaller datasets.
    Conclusion: The higher precision of the PPs compared with the SVMs suggests that working directly with sequences is advantageous for detailed protein classification, as precision is essential to avoid annotation errors. That the PPs outperformed BLAST on the smaller datasets demonstrates the potential of the methodology, but the drop in performance observed for the larger datasets indicates that further development is required. Possible strategies to address this issue include partitioning the datasets into smaller subsets and training individual PPs for each subset, or training several PPs for each dataset and combining them with a bagging strategy.
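The comparison above is scored with the Matthews Correlation Coefficient. As a minimal sketch of how MCC is computed from a binary confusion matrix (the counts below are illustrative, not taken from the study):

```python
import math

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # conventional value when any margin of the matrix is empty
    return (tp * tn - fp * fn) / denom

# Illustrative counts for one hypothetical enzyme class:
print(round(mcc(tp=40, tn=90, fp=5, fn=10), 3))
```

Unlike plain accuracy, MCC balances all four cells of the confusion matrix, which is why it is a common choice when the positive class (proteins in a given EC class) is much smaller than the negative class.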

    Metrics for GO based protein semantic similarity: a systematic evaluation

    Abstract
    Background: Several semantic similarity measures have been applied to gene products annotated with Gene Ontology (GO) terms, providing a basis for their functional comparison. However, it is still unclear which approach to semantic similarity is best in this context, since there has been no conclusive evaluation of the various measures. Another open issue is whether electronic annotations should be used in semantic similarity calculations.
    Results: We conducted a systematic evaluation of GO-based semantic similarity measures, using their relationship with sequence similarity to quantify performance, and assessed the influence of electronic annotations by testing the measures in the presence and absence of those annotations. We verified that the relationship between semantic and sequence similarity is not linear, but can be well approximated by a rescaled Normal cumulative distribution function. Given that the majority of the semantic similarity measures capture identical behaviour but differ in resolution, we used resolution as the main criterion of evaluation.
    Conclusions: This work provides a basis for comparing several semantic similarity measures and can aid researchers in choosing the most adequate measure for their work. We found that the hybrid simGIC was the measure with the best overall performance, followed by Resnik's measure with a best-match average combination approach. We also found that the average and maximum combination approaches are problematic, since both are inherently influenced by the number of terms being combined. We suspect a direct influence of data circularity on the results that include electronic annotations, as a consequence of functional inference from sequence similarity.
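The best-performing measure, simGIC, is an information-content-weighted Jaccard index over the two proteins' GO term sets. A minimal sketch, using hypothetical term identifiers and IC values (real IC values would come from annotation-corpus frequencies, and term sets would be extended with all GO ancestors):

```python
def sim_gic(terms_a: set, terms_b: set, ic: dict) -> float:
    """simGIC: IC-weighted Jaccard similarity over two GO term sets.
    Term sets are assumed to already include all ancestor terms."""
    inter = sum(ic[t] for t in terms_a & terms_b)
    union = sum(ic[t] for t in terms_a | terms_b)
    return inter / union if union else 0.0

# Hypothetical information-content values and annotation sets:
ic = {"GO:a": 1.0, "GO:b": 2.5, "GO:c": 4.0, "GO:d": 3.0}
p1 = {"GO:a", "GO:b", "GO:c"}
p2 = {"GO:a", "GO:b", "GO:d"}
print(round(sim_gic(p1, p2, ic), 3))  # 3.5 / 10.5
```

Because every shared term contributes its IC to both numerator and denominator, specific (high-IC) shared terms dominate the score, while shallow shared terms contribute little.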

    Metrics for GO based protein semantic similarity: a systematic evaluation-3

    Semantic similarity results (in red) and the inverse of the number of annotations per protein (in grey) as a function of sequence similarity: A - using the LRBS sequence similarity metric; B - using the RRBS metric. There is an evident parallel between the behaviour of the semantic similarity results and the distribution of the inverse of the number of annotations per protein, which becomes more evident for high sequence similarity values. This parallel reflects the inverse proportionality between the average combination approach and the number of annotations per protein.
    Copyright information: taken from "Metrics for GO based protein semantic similarity: a systematic evaluation", http://www.biomedcentral.com/1471-2105/9/S5/S4, BMC Bioinformatics 2008;9(Suppl 5):S4. Published online 29 Apr 2008. PMCID: PMC2367622.

    Metrics for GO based protein semantic similarity: a systematic evaluation-5

    Behaviour of the simGIC and simUI measures: in red - simGIC in the full dataset; in green - simUI in the full dataset; in blue - simGIC in the non-electronic dataset; in violet - simUI in the non-electronic dataset; A - with the LRBS sequence similarity metric; B - with the RRBS metric. Both measures show behaviour similar to that of the term measures with the BMA approach, with simGIC having a higher resolution than simUI, and indeed the highest overall resolution of all measures tested.

    Metrics for GO based protein semantic similarity: a systematic evaluation-2

    Combination approaches to Resnik's measure: maximum (in red), average (in green), BMA (in blue) and BMA + GraSM (in violet). A - in the full dataset with the LRBS sequence similarity metric; B - in the non-electronic dataset with the LRBS metric; C - in the full dataset with the RRBS metric; D - in the non-electronic dataset with the RRBS metric. The modelling curves in A and C were composed of two additive Normal cumulative distribution functions, and the curve for the average approach also included a negative linear component; in B and D, all curves were composed of a single Normal function. Notably, while all four approaches exhibit similar behaviour in the non-electronic dataset (B and D), the maximum and particularly the average approach perform poorly in the full dataset (A and C), the former having very low resolution and the latter decreasing for high sequence similarity values. The same behaviours and relationships between the approaches were obtained for Lin's and for Jiang and Conrath's measures.
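The maximum, average, and best-match average (BMA) combination approaches discussed in these figures can be sketched on a toy term-by-term similarity matrix; the matrix values below are made up purely to illustrate why the average approach is diluted by the number of term pairs:

```python
def combine(sim_rows, mode):
    """Combine a pairwise term-similarity matrix (rows: protein A's terms,
    columns: protein B's terms) into a single protein-level score."""
    flat = [s for row in sim_rows for s in row]
    if mode == "max":
        return max(flat)
    if mode == "avg":
        return sum(flat) / len(flat)
    if mode == "bma":  # best-match average: mean of row-wise and column-wise best matches
        row_best = [max(row) for row in sim_rows]
        col_best = [max(col) for col in zip(*sim_rows)]
        return (sum(row_best) / len(row_best) + sum(col_best) / len(col_best)) / 2
    raise ValueError(mode)

# Toy matrix: two well-matched term pairs plus unrelated cross terms.
m = [[0.9, 0.1],
     [0.1, 0.8]]
print(combine(m, "max"))            # 0.9
print(round(combine(m, "avg"), 3))  # 0.475 - diluted by the off-diagonal pairs
print(round(combine(m, "bma"), 3))  # 0.85
```

Adding more annotations adds more weakly-matching cross pairs, which drags the average down (the inverse proportionality noted above) while leaving the maximum and BMA largely unaffected.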

    Evaluation of a quality improvement intervention to reduce anastomotic leak following right colectomy (EAGLE): pragmatic, batched stepped-wedge, cluster-randomized trial in 64 countries

    Background: Anastomotic leak affects 8 per cent of patients after right colectomy, with a 10-fold increased risk of postoperative death. The EAGLE study aimed to develop and test whether an international, standardized quality improvement intervention could reduce anastomotic leaks.
    Methods: The internationally intended protocol, iteratively co-developed by a multistage Delphi process, comprised an online educational module introducing risk stratification, an intraoperative checklist, and harmonized surgical techniques. Clusters (hospital teams) were randomized to one of three arms with varied sequences of intervention/data collection by a derived stepped-wedge batch design (at least 18 hospital teams per batch). Patients were blinded to the study allocation. Low- and middle-income country enrolment was encouraged. The primary outcome (assessed by intention to treat) was the anastomotic leak rate, and subgroup analyses by module completion (at least 80 per cent of surgeons: high engagement; less than 50 per cent: low engagement) were preplanned.
    Results: A total of 355 hospital teams registered, with 332 from 64 countries (39.2 per cent low and middle income) included in the final analysis. The online modules were completed by half of the surgeons (2143 of 4411). The primary analysis included 3039 of the 3268 patients recruited (206 patients had no anastomosis and 23 were lost to follow-up), with anastomotic leaks arising before and after the intervention in 10.1 and 9.6 per cent of patients respectively (adjusted OR 0.87, 95 per cent c.i. 0.59 to 1.30; P = 0.498). The proportion of surgeons completing the educational modules influenced the effect: the leak rate decreased from 12.2 per cent (61 of 500) before the intervention to 5.1 per cent (24 of 473) after it in high-engagement centres (adjusted OR 0.36, 0.20 to 0.64; P < 0.001), but this was not observed in low-engagement hospitals (8.3 per cent (59 of 714) and 13.8 per cent (61 of 443) respectively; adjusted OR 2.09, 1.31 to 3.31).
    Conclusion: Completion of globally available digital training by engaged teams can alter anastomotic leak rates.
    Registration number: NCT04270721 (http://www.clinicaltrials.gov).
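The odds ratios reported above are adjusted model estimates; a crude (unadjusted) odds ratio can be recomputed directly from the high-engagement counts quoted in the abstract, as a sanity check on the direction and rough magnitude of the effect:

```python
def odds_ratio(events_a, total_a, events_b, total_b):
    """Crude (unadjusted) odds ratio of group A relative to group B."""
    odds_a = events_a / (total_a - events_a)
    odds_b = events_b / (total_b - events_b)
    return odds_a / odds_b

# High-engagement centres, counts quoted in the abstract:
# 24/473 leaks after the intervention vs 61/500 before it.
print(round(odds_ratio(24, 473, 61, 500), 2))  # 0.38, close to the reported adjusted OR of 0.36
```

The crude estimate lands near the adjusted value of 0.36; the remaining gap reflects the covariate and design adjustments of the stepped-wedge model, which this simple two-by-two calculation does not attempt to reproduce.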